Principal Component Analysis and Higher Correlations for Distributed Data

Authors

  • Ravi Kannan
  • Santosh Vempala
  • David P. Woodruff
Abstract

We consider algorithmic problems in the setting in which the input data has been partitioned arbitrarily on many servers. The goal is to compute a function of all the data, and the bottleneck is the communication used by the algorithm. We present algorithms for two illustrative problems on massive data sets: (1) computing a low-rank approximation of a matrix A = A^1 + A^2 + ... + A^s, with matrix A^t stored on server t, and (2) computing a function of a vector a_1 + a_2 + ... + a_s, where server t has the vector a_t; this includes the well-studied special case of computing frequency moments and separable functions, as well as higher-order correlations such as the number of subgraphs of a specified type occurring in a graph. For both problems we give algorithms with nearly optimal communication, and in particular the only dependence on n, the size of the data, is in the number of bits needed to represent indices and words (O(log n)).
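The setting in problem (2) can be made concrete with a small sketch. The code below is not the algorithm from the paper; it only illustrates one standard reason communication can be kept small in this model: a linear sketch of the sum a_1 + ... + a_s equals the sum of the servers' local sketches, so each server can send k numbers rather than n. The helper names (`make_sketch`, `apply_sketch`, `coordinator_sum`), the random sign matrix, and the toy vectors are all assumptions made for illustration.

```python
import random

def make_sketch(k, n, seed):
    """Shared k x n random sign matrix; every server derives it from the same seed."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(k)]

def apply_sketch(S, v):
    """Compute the matrix-vector product S v for a list-of-lists S and a list v."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]

def coordinator_sum(sketches):
    """Add the k-dimensional sketches received from the servers, coordinate by coordinate."""
    return [sum(parts) for parts in zip(*sketches)]

# Hypothetical toy data: s = 3 servers, each holding a vector a_t of length n = 8.
n, k = 8, 4
local_vectors = [
    [1, 0, 2, 0, 0, 3, 0, 1],
    [0, 1, 0, 0, 4, 0, 0, 2],
    [2, 2, 0, 1, 0, 0, 5, 0],
]
S = make_sketch(k, n, seed=0)

# Each server transmits only its k-dimensional sketch, not its n-dimensional vector.
sent = [apply_sketch(S, a_t) for a_t in local_vectors]
combined = coordinator_sum(sent)

# Reference computation: sketch the true sum a = a_1 + ... + a_s directly.
total = [sum(col) for col in zip(*local_vectors)]
direct = apply_sketch(S, total)

# Linearity: the sum of the local sketches is exactly the sketch of the sum.
assert combined == direct
```

The same linearity holds for matrices, since S(A^1 + ... + A^s) = S A^1 + ... + S A^s. How large k must be for a given accuracy, and how the paper's protocols actually achieve nearly optimal communication, are not shown here.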


Similar resources

Outlier Detection in Wireless Sensor Networks Using Distributed Principal Component Analysis

Detecting anomalies is an important challenge for intrusion detection and fault diagnosis in wireless sensor networks (WSNs). To address the problem of outlier detection in wireless sensor networks, in this paper we present a PCA-based centralized approach and a DPCA-based distributed energy-efficient approach for detecting outliers in sensed data in a WSN. The outliers in sensed data can be ca...


Faults and fractures detection in 2D seismic data based on principal component analysis

Various approaches have been introduced to extract as much information as possible from seismic images for any specific reservoir or geological study. Modeling of faults and fractures is among the most attractive targets for interpretation in geological studies of seismic images, and several strategies have been presented for this specific purpose. In this study, we have presented a modified approach of ap...


Development of a cell formation heuristic by considering realistic data using principal component analysis and Taguchi’s method

Over the last four decades of research, numerous cell formation algorithms have been developed and tested, yet this research remains of interest to this day. Appropriate formation of manufacturing cells is the first step in designing a cellular manufacturing system. In cellular manufacturing, consideration of manufacturing flexibility and production-related data is vital for cell formation....


Analysis of the Relationship between Distributed Leadership Style and Organizational Effectiveness of High Schools in Hamadan

The main objective of this study was to investigate the relationship between Distributed Leadership and Organizational Effectiveness of high schools in Hamadan city. The research method was descriptive-correlation. The statistical population included all high school teachers of Hamadan in the academic year of 2015. Based on classical random sampling, using Krejcie and Morgan chart, 335 teachers...


Relationship between Some Environmental Factors with Distribution of Medicinal Plants in Ghorkhud Protected Region, Northern Khorasan Province, Iran

Medicinal plant species constitute a considerable part of the flora in Iran and play a major role in the composition of plant communities. Therefore, it is necessary to recognize the factors leading to the establishment and distribution of vegetation. For data sampling (2012), land units were specified. The plot size was determined using minimal area method and number of plots was determined by...


Sparse Structured Principal Component Analysis and Model Learning for Classification and Quality Detection of Rice Grains

In scientific and commercial fields associated with modern agriculture, the categorization of different rice types and the determination of their quality are very important. Various image processing algorithms have been applied in recent years to detect different agricultural products. In this paper, the problem of rice classification and quality detection is presented based on model learning concepts includ...




Publication date: 2014